Collocation Extraction Using Web Statistics

Authors

  • Hsin-Hsi Chen
  • Yi-Cheng Yu
  • Chih-Long Lin
Abstract

This paper mines collocations from two different web usage corpora, the NTU proxy log and the TTS search log. Human judges evaluated a 2% sample of the extracted collocations, giving precisions of 61.76% for the NTU test data and 57.50% for the TTS test data. For automatic evaluation, we submit each extracted collocation to the Google search engine and use the resulting page counts to compute the mutual information (MI) of the collocation. Experimental results show that 43.27% and 42.65% of the collocations mined from the NTU and TTS corpora, respectively, passed the MI test.
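As a rough illustration of the automatic evaluation described above, the sketch below estimates pointwise mutual information from search-engine page counts. The abstract does not give the exact formula, the index size, or the pass threshold, so the function names, the `total_pages` parameter, and the default `threshold` of 0 are all assumptions made for this example, and the counts are passed in by hand rather than fetched from any search API.

```python
import math

def pmi_from_page_counts(count_x, count_y, count_xy, total_pages):
    """Estimate pointwise mutual information (base 2) from page counts.

    count_x, count_y: pages containing each word on its own (assumed inputs)
    count_xy: pages containing the candidate collocation
    total_pages: assumed size of the search engine index
    """
    if min(count_x, count_y, count_xy) <= 0 or total_pages <= 0:
        # No co-occurrence evidence: treat as the lowest possible score.
        return float("-inf")
    p_x = count_x / total_pages
    p_y = count_y / total_pages
    p_xy = count_xy / total_pages
    return math.log2(p_xy / (p_x * p_y))

def passes_mi_check(count_x, count_y, count_xy, total_pages, threshold=0.0):
    """Hypothetical filter: accept a candidate whose PMI exceeds threshold."""
    return pmi_from_page_counts(count_x, count_y, count_xy, total_pages) > threshold

if __name__ == "__main__":
    # Made-up counts, for illustration only.
    score = pmi_from_page_counts(1000, 1000, 500, 1_000_000)
    print(f"PMI = {score:.3f}")
```

A positive score means the two words appear together more often than independence would predict, which is the usual reading of a candidate that passes an MI test.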


Similar resources

Using Collocation Statistics in Information Extraction

Our main objective in participating in MUC-7 is to investigate and experiment with the use of collocation statistics in information extraction. A collocation is a habitual word combination, such as "weather a storm", "file a lawsuit", and "the falling yen". Collocation statistics refers to the frequency counts of the collocational relations extracted from a parsed corpus. For example, out of 6577 i...


A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted b...


Extracting Academic Subjects Semantic Relations Using Collocations

The paper presents an approach to analyzing the semantic content of academic subjects and their internal relations using statistically based techniques for collocation extraction from a large electronic educational text corpus. It offers a survey and analysis of some related corpus-based approaches to extracting conceptual relations used for educational purposes and presents a technique for semantic search of c...


Exploratory Search on the Mobile Web

We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on a tablet computer, i.e. an Apple iPad, and on a mobile device, i.e. Apple iPhone or iPod touch. The topics are extracted from web snippets which are determined by a standard search engine. We consider the extraction of topics as a specific ...




Publication date: 2004